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Abstract 

We revisit the Bayesian online inference problems for the linear dynamic systems (LDS) under non- 
Gaussian environment. The noises can naturally be non-Gaussian (skewed and/or heavy tailed) or to 
accommodate spurious observations, noises can be modeled as heavy tailed. However, at the cost of such 
noise robustness, the performance may degrade when such spurious observations are absent. Therefore, 
any inference engine should not only be robust to noise outlier, but also be adaptive to potentially 
unknown and time varying noise parameters; yet it should be scalable and easy to implement. 

To address them, we envisage here a new noise adaptive Rao-Blackwellized particle filter (RBPF), by 
leveraging a hierarchically Gaussian model as a proxy for any non-Gaussian (process or measurement) 
noise density. This leads to a conditionally linear Gaussian model (CLGM), that is tractable. However, this 
framework requires a valid transition kernel for the intractable state, targeted by the particle filter (PF). 
This is typically unknown. We outline how such kernel can be constructed provably, at least for certain 
classes encompassing many commonly occurring non-Gaussian noises, using auxiliary latent variable 
approach. The efficacy of this RBPF algorithm is demonstrated through numerical studies. 

Index Terms 

noise adaptive filter; Rao-Blackwellized particle filter; noise robust inference; Kalman filter; asym¬ 
metric noise; 


I. Introduction 

Many applications of interest to the signal processing community (e.g., tracking, autonomous navigation 
and surveillance) require that the inference must be performed on near real-time (online) using the 
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streaming sensor data. The stability and the performance of the underlying inference engines however 
depend crucially on the sensor data quality. Usually, any spurious sensor data needs to be detected and 
discarded before passing to the inference engine, so that the latter is not susceptible to permanent failure. 
In recent times, the sensors are becoming cheaper, however at the expense of their performance reliability. 
The proliferation of such inexpensive sensors has opened up the possibility to explore many complex and 
high dimensional problems (e.g., motion tracking, road traffic monitoring), which are hitherto impossible 
with a limited number of costly sensors. This trend for using inexpensive sensors has in turn, laid greater 
emphasis on the processing algorithms to the extent that any inference algorithm (presumably with more 
computing prowess) is required to be robust and stable against such spurious sensor data; yet simple to 
implement and vastly scalable to the high dimensional problems. Against this backdrop, we consider the 
online inference problems for the discrete time LDS. 

A. Problem Background 

Consider the following discrete time LDS relating the latent state Xk 6 M nx at time step k to the 
observation yk 6 M n " as 


Xk A fa Xk—\ + -Bfc U)k> 

(la) 

Vk = Cfc 

(lb) 


where Wk and e/. are the process and measurement noise respectively. The noises are assumed to be 
independent and also independent of each other. The model parameters {Ak, Bk,Ck} are considered to 
be known here. Given this model, an initial state prior (i.e., p(x o)) and a stream of observations up 
to time k, ijq - j - = {yo,yi, ■ ■ ■■ y /,. } , one typical inference task is to optimally estimate the sequence of 
(posterior) densities p{xk\y\ : k), in an online fashion over time. This is known as the online Bayesian 
filtering problem for the LDS, and the density p(xk\yi-.k ) is called as the filtering density. When the noises 
above are Gaussian, the filter density can be obtained recursively in closed form using the celebrated 
Kalman filter (KF) |[Tj]-|[3]]. However, this analytical tractability is lost if any noise deviates from such 
nominal Gaussian assumptions. 

In reality, many noise sources naturally appear to be non-Gaussian (characterized by their heavy 
tails and/or skewness). For example, noise with an impulsive nature (sharp spikes and/or occasional 
bursts) appears in many applications such as speech and audio signal processing, astrophysical imaging, 
underwater navigations, multi-user radar communications, kick detection in oil drilling, finance and 
insurance among others [4]—[ 11J- This impulsive nature can be modeled e.g., by a noise distribution 
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that has heavier tails than the Gaussian distribution. Heavy tailed distributions are also used to model the 
presence of so called outliers, which are data points, that do not appear to follow the pattern of the other 


data points 121. Data from the visual sensors, GPS devices, sonar, and radar are often contaminated by 
such outliers. The root causes of such outlying observations are often unknown or arc excluded deliberately 
from the model due to the complexity and the computational issues. So under such simplified modeling 
of complex real world processes, these unusual observations can be taken care by noise outliers. This in 


turn, requires heavy tailed distributions for the noises |13|. 

In the filtering context, when a nominal model with specified Gaussian noises cannot account for 
the outliers or sudden change in unknown input signals, the filter becomes unstable. Since for the real 
world applications, often we have weak knowledge on the systems and outliers are frequent, naturally, 
the filtering problems under heavy tailed noises have attracted considerable research attentions [14J-(T7j. 
We note that in the context of filtering, the process and/or the observation noise can have heavy tails. 
Although heavy tailed observation noise have received much attention, in applications like maneuvering 
target tracking, modeling occasional pilot induced changes requires a heavy tailed process noise |jT 8 j. In 


contrast, although skewness appears in many applications [ 19 1 , the associated online inference problem 
has not been well explored Q 

Non-Gaussian noise can also appear due to the modeling artifact. We illustrate the last point through 
the following example: 


1) Stochastic volatility model: We consider the following discrete time model [21 ] 

hk+i = 7o + lihk + Vk, 
y k = e k exp(h k /2 ), 


( 2 a) 

(2b) 


where h k and y k are the latent log-volatility and observed asset return at time step k with | 7 i| < 1 , 
h$ ~ J\f(0 , jz^sr), ij k /V(0, cr^) and jV(0,1), where iid means independent and identically 

distributed. 70,71 and o n are assumed to be known. Here r) k is uncorrelated to < 7 , i.e. no leverage 
effect is considered. Note that the measurement noise is multiplicative. The above dynamic model can 
equivalently be cast as a linear state space model: 


h k +1 = 7o + 7i hk + rjk , 
log(yl) = h k + log(e\). 


(3a) 

(3b) 


'One notable exception is the recent article 


20 


, that has been brought to our notice. 
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However the observation noise e k = log(e |), being log of a x 2 random variable, is no more Gaussian. 
In fact, the density of this noise is available analytically, given by (see e.g., 1221) 

f(e k ) = =exp{ - 0.5 (exp(e k ) - e k )}, -oo < e k < oo. (4) 

v Z7T 

From Figure ([!]), it is evident that the density is highly skewed. Thus, we have a LDS driven by a 



Figure 1: Density plot for the observation noise in Q. 


non-Gaussian (measurement) noise. 


While the noise distribution is certainly a key factor for the inference task, the other equally im¬ 
portant aspect for practical considerations is the knowledge on the noise statistics. In many practical 
contexts, although prior knowledge on the noise distributions is fairly available, the corresponding noise 
parameters are often unknown. Thus additionally, the noise parameters need to be estimated on the fly 
as well. This is known as online noise adaptive filtering. When the noise parameters are stationary, the 
online Bayesian estimation of the parameters is known to be difficult and one practical solution is to 


assume the noise parameters to be slowly varying in time [23 ]. However, this assumption easily breaks 
down in the presence of any potential outlier. To get rid of the outliers, some mechanisms are usually 
placed to detect and immediately discard them. However such outlying observations may carry impor¬ 
tant information about e.g., any unmodeled system characteristics and model degradation and as such, 
as pointed out in j24j, ’’Routinely ignoring unusual observations is neither wise 
nor statistically sound”. On the other hand, eventhough such outliers can be accommodated 
by using e.g., a properly specified heavy tailed noise distribution from a known parametric family, the 
performance of the filter can be degraded severely when the outliers are absent. This happens due to 
the use of fixed form of distribution. To alleviate this problem in noise adaptive filtering, essentially 
what we need is to use a heavy tailed noise, whose parameters are time varying. However, coping with 
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nonstationary noise parameters in online setting is very challenging as the corresponding dynamics for 
the parameters cannot be easily specified a priori in practice. 


B. Prior works 

There is a long history on the efforts of improving the KF algorithm under the non-Gaussian environ¬ 
ments. The earlier attempts were mainly based on the robustification arguments in the presence of outliers. 
These were primarily addressed either by analytical approximations using elliptical noise distributions or 
by heuristic cost functions in the update step of KF. The elliptical noise based approaches | [25| , | [26| were 
not robust, as the posterior mean is unbounded as a function of the residuals, whereas the approaches based 
on the ad-hoc cost functions (e.g., [27j, ff28l) requires tuning of parameters and their implementations 


were very involved as compared to KF. Later, the simulation based approaches (e.g., PF) [24], [29] became 
popular due to their ease of implementations. Flowever, the major disadvantages were their non-scalability 


together with computational costs [30]. Several recent articles have addressed the filtering problem using 


variational Bayes (VB) [14], | T5J, |T7| and convex optimization based methods | j3T[ . Although VB 
method is quite scalable, in general, it requires fairly involved mathematical derivations and is known to 
consistently underestimate the posterior covariance. Common to all these recent studies is the assumption 
of outliers in the observation noise only. The methods by (BJ, (T7J, [j3TJ do not accommodate any 
persistence (time correlated) noise outliers. Moreover, [15] considers a heuristic transition model for the 


noise parameter (as observed by 141, the filter is not stable under abrupt change in noise) whereas 


JT4|, pT| require additional input parameter from the user. More recently, observing that the VB based 


algorithms are very sensitive to process noise outlier, [16| proposed an analytical approximation based 
t filter, where both the process and observation noises are Students’ t. The approximations here require 
that the estimated state and process/observation noise is jointly t with a common degree of freedom (dof) 
parameter. To maintain these hypotheses and to prevent the growth of dof parameters in filter recursion, 
again careful tuning is required at each step. For all the cases, the noise class is implicitly assumed to 
be symmetric and Students’ t. 

From the literature above, we see that many of the existing approaches require manual tuning of the 
parameters and also their implementations are substantially involved as compared to well understood KF. 
Thus any future inference framework should ideally be free of such heuristic tuning and the implementa¬ 
tion should be simple to understand. Also, any future framework should be able to handle-(a) asymmetric 
noise and (b) symmetric heavy tailed noise other than Students’ t. 
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C. Sketch of our contributions 

Keeping in view the points above, we consider the online inference for the LDS under a fairly general 
and realistic scenario. This is addressed here in two stages by successively dropping the following common 
assumptions: (a) noises arc Gaussian and (b) noise parameters are known. 

In the first stage, we consider the LDS under (known) stationary non-Gaussian environment. The 
corresponding inference task is analytically intractable. In our approach, we envisage a hierarchically 
Gaussian model (HGM) as a proxy for any non-Gaussian noise. This model can represent the skewness 
and/or heavy tail that are typical characteristics of the non-Gaussian noise. Such representation allows us 
to exploit a CLGM, which is analytically tractable using a KF. This in turn, leads to a RBPF framework 
for the online inference task. Since within the RBPF framework, PF is confined to a space of lower 
dimension, it acts as an enabler for scaling to high dimensional problems. The proposed framework here 
uses a bank of KFs, which is simple to understand; the sophistication comes in the way how we mix 
and propagate the output of different KFs to get the target filter density. Although this RBPF framework 


is not entirely new 321, a proper specification of the transition kernel for the state, targeted by PF is 
often difficult and this technical aspect has received little attention. We elaborate this issue in details and 
address the proper transition kernel for certain class of noises (where widely used Students’ t is a special 
case). 

For the second stage, we point how this RBPF framework is already doing a noise adaptive filtering. 
Flowever to make the filter robust against any large noise deviation, we need prior knowledge about such 
deviation. Often this is reasonably well known for many practical applications. For such cases, we outline 
how this information can be encoded in the noise prior. This completes our noise robust online inference 
framework. 


D. Organization of the article 

The rest of the article is organized as follows. We start with brief descriptions on F1GM in section |TI] and 


simulation based online filtering in section III This is then followed by the development of the proposed 


inference framework for LDS under stationary non-Gaussian environment in section IV We then describe 
how the above framework can be used for robust noise adaptive filtering in section [Vj Subsequently, the 
numerical experiments are shown in VI which is followed by concluding remarks in section |VII| 
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II. Hierarchical Gaussian model (HGM) for non-Gaussian density 

Non-Gaussian densities are often characterized by the presence of heavy tail and/or skewness. Many 
such densities can often be modeled as hierarchically Gaussian. That is, the density can be represented 
as Gaussian, conditionally on an auxiliary random variable, known as the 'mixing parameter’. We outline 
below a brief description for such hierarchical Gaussian representation. For the clarity of the presentation, 
we consider two separate classes based on the symmetry of underlying density. When the density is 
symmetric, HGM is represented as scale mixture of normal (SMN), also known as normal variance 
mixture (NVM). For the non symmetric density, it is represented as normal variance-mean mixture 
(NVMM) model. 


A. Scale mixtures of normal 

A random vector X has a scale mixtures of normal density, if it can be expressed as follows 

X = n + K2 (A)Z, (5) 


where /r is a location vector, Z is distributed according to a zero mean multivariate normal with covariance 
matrix £ and k is a positive weight function. A is a random variable, known as mixing parameter 
and is distributed on the positive real line, independent of Z. Note here that conditioned on A = A, 
X follows a multivariate normal distribution with mean vector // and covariance matrix n (\)£, i.e., 
(*|A = A) N{x\n, k(A )£). Then, the probability density function (pdf) of X can be expressed as 


p(x) = / Af(x\fi,K(\)T,)p(\)d\, 


( 6 ) 


where p{ A) is the density function of A. p( A) is referred to as the mixing density of SMN representation. 

The symmetrical class of SMN includes among others, the popular Students’ t, the Pearson type VII 
family, the Slash and the variance gamma distribution^] All these distributions are characterized by their 
heavy tails as compared to the normal distribution. Although there exists many other important SMN 
distributions (e.g., exponential power, symmetric a -stable, logistic, horse shoe, symmetric generalized 

, in the sequel, we present only the above mentioned special cases, as 


hyperbolic distribution) [34|, 
the associated mixing densities have computationally attractive form. 


2 The Gaussian distribution in this context, can be thought as a degenerate mixture 
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1) Multivariate Students’ t distribution: When X follows a multivariate Students’ t distribution with 
location p, scale E and degrees of freedom u (i.e., X ~ Tip, E; z/)), the pdf of X can be expressed in 
the following SMN form 

fOO v 7/7/ 

p{x) = j N{x\p,-)Ga{X\-,-)dX, (7) 

where Ga(-\a,b) is the gamma density function of the form 

Go(A|a, b) = A“ _1 e _6A , A, a, b > 0, (8) 

r(a) 

and T(a) is the gamma function with argument a > 0. Consequently, X can be presented in the following 
hierarchical form 

X|(/i,E,i/,A)~7V(/x,A- 1 E), A| v ~Ga(~). (9) 


2 ) Pearson type VII distribution: If X belongs to the Pearson type VII family, the associated density 
is given by 

/ , v « 1 1 1 , ( x - V) 2 \-(S+i)/2 

M^,S) = +-^-) , ( 10 ) 

where p and E arc the location and scale parameters, v > 0 and b > 0 are the shape parameters and 
B(a,b) is the beta function with arguments a > 0 and b > 0. The Pearson type VII density can be 
expressed hierarchically as 


X\(p, E, i/, <5, A) ~ A f(p, A _1 S), \\(v, 6) ~ Ga(^ 5 ~). 

We get back to Students’t distribution when v = 5 and Cauchy if v = b = 1. 

3) Multivariate slash distribution: The Slash distribution can be hierarchically represented as 


( 11 ) 


X\(p, S, u) ~ A f(p, A _1 E), X\u ~ Be{y, 1), 


( 12 ) 


with 0 < A < 1 and v > 0, where Be(-) denotes the Beta distribution. 

4) Multivariate variance gamma distribution: When X follows a multivariate variance gamma (VG) 
distribution with location p, scale E and degrees of freedom u > 0 (i.e., X ~ VQ(p, E; i/)), the density 
can be represented in the following hierarchical form 

X\(p,E,u,X)^Af(p,X~ 1 S:), X\V~IG(~), (13) 

where IG(a,b) is the inverse gamma density given by 

IG(X\a,b) = -^-X-^e~ b /\ (14) 

T(a) 

When v = 2, VG turns into a Laplace distribution. 
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B. Normal variance-mean mixture 


A random vector X is following a normal variance-mean mixture distribution, if it can be expressed 


as follows [361 


X = ii+ (3 A + VAZ, 


(15) 


where A is independent of Z and Z is distributed according to a zero mean multivariate normal with 
covariance matrix E. The random variable A is the mixing parameter and is distributed on the positive 
real line and p, (3 e R. The distribution of X conditioned on A = A is multivariate normal given as 
(X|A = A) ~ N(x\p + /3A, EA). Note that when (3 = 0, the NVMM turns into a SMN with rc(A) = A. 
We describe below some special cases of this class of distributions: 

1) Generalized hyperbolic distribution: X in ( p3| ) follows a generalized hyperbolic (GH) distribution, 
if A is distributed according to a generalized inverse Gaussian (GIG) distribution as 


( a/b) p ( 2 ! aA + 6/A 

p{ A) = ’ X p l exp{ -——}, A > 0, 


(16) 


2 K p (Vab)" ~ 2 

where K p is the modified Bessel function of the second kind, a, b E R + , and p £ R. 

2) GH skew Students’ t distribution: The hierarchical structure of GH skew Students’ t distribution is 
given as hzj 

(X| A = A) ~ J\f(x\p + /3A, EA), X\u ~ IG(v/2, v/2), (17) 


where inverse-gamma density is given by ( [T4| ). Note that the GH skew Students’ t is a special case of 
the GH distribution, where the parameters for the GIG are selected as a = — v/2 (u > 0), b = \/b and 
p = 0. Moreover, when f3 = 0, this turns to a symmetric Students’ t distribution and if u —> oo, this 
becomes a skew normal distribution. 

3) GH variance gamma distribution: The hierarchical structure of GH variance gamma distribution is 
given as 

(X \A = A) ~ Af(x\p + j3X, EA), \\u ~ Ga( i//2, u/2), (18) 


where the gamma density is given by ([8]). 

Remark 1: note that the mixing parameter A in II-A and |II-B| being a random variable, dimension of 
A is always less than or equal to that of random vector X. This observation is important, as we will see 
later that the PF can be used to target the low dimensional A in a RBPF framework. 
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III. Simulation based online filtering 

When a dynamic system (state space model) is nonlinear and/or driven by non-Gaussian noises, the 
filter density p(x k \yi: k ) is in general, analytically intractable. To deal with, many approximated methods 
have been proposed over time |38]. Particle filtering (PF) is one such method, which uses Monte Carlo 
simulations to address the filtering problem. In this section, we give a very brief overview of PF and 
RBPF. 


A. Particle filtering (PF) 

In PF, the posterior distribution associated with the density p(xo :k \yi :k ) i s approximated by an empirical 
distribution induced by a set of A r (3> 1) weighted particles (samples) as 


N 


PN(dx 0 - k \y 1:k ) = ( dxo-.k ); > 0, 


(19) 


i= 1 


where 5 x a)(A) is a Dirac measure for a given x'^j,, and a measurable set A, and w"^ is the asso¬ 
ciated weight attached to each particle x^ k , such that YliLi '^' k = 1- Even though the distribution 
P/v(dxo:fc|yi:fc) does not admit a well defined density with respect to the Lebesgue measure, we use 
notational abuse to represent the associated empirical density as 


N 


PN(XQ:k\yi:k) = ^ S(x 0:k ~ X^). 


„(*) 


( 20 ) 


i =1 


Although ( [20] ) is not mathematically rigorous, it is intuitively easier to follow than the stringent measure 
theoretic notations, especially if we are not concerned with the theoretical convergence studies. 

Note that the posterior p(xo :k \yi: k ) , which we target using a PF, is unknown. The empirical distri¬ 
bution Pjv(dxo : k\yi:k) i n ( P~9| ) is obtained by first generating samples x^) k from a proposal distribution 
tt{xQ-.k\yi\k) and then the corresponding weights are obtained using the idea of importance sampling as 


,(0 _ 


p{x { S k \yi-. k ) 


fii) _ 


W 




Ef,i A'' ’ 


( 21 ) 


Given this PF output, one can now approximate the marginal distribution p{x k \yi- k ) as PN(dx k \yi :k ) = 


YliLi (dx k ). Suppose at time (k — 1), we have a weighted particle approximation of the posterior 

p(x 0:k -i\yi- k -i) as P N (dx 0 :k-i\yi:k-i) = (d/X(y k — i). Now with a new measurement 

yk, we wish to approximate p{xo- k \yi :k ) with a new set of particles (samples). A standard PF uses the 
following posterior path-space recursion 




PO0:fc|2/l:fc) OC p{yk\xO:k,yi:k-l)p{xk\xO:k-l,yi:k-l)p{x 0 :k-l\yi:k-l)- 


( 22 ) 
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Now for the Markovian state space model, this becomes 

P(xo-.k\yi:k) P(Vk\Xk) P(Xk\Xk-l) p{x 0 :k-l\yi:k-l)- 


(23) 


Assuming that the proposal distribution can be decomposed as 7r(aio:fc|2/i:fc) = tt(^k\ x O:k-ii 2 /i:fc) 7 r ( a; 0 :fc-i| 2 /i:fc-i)j 
we can now implement a sequential importance sampling, where the particles arc propagated to time k 
by sampling a new state from the marginal proposal kernel 7r(.x/,-|.XQ i j. ( , y\ : /.) and setting x[*|. = 


„(*) 


(*) 


'0:fc-l> X k 


. Subsequently using ([23]), the corresponding weights of the particles can be given by 


,(*) 


p(yfei4 l) M4 l) i4-t)~w 
- K = 


w 


(0 


( IIJI III \ .V J- ,» v-^iV 

n x k l* 0 :fc-H l^j=l W k 


0 V 


(24) 


To avoid carrying trajectories with small weights and to concentrate upon the ones with large weights, 
the particles need to be resampled regularly. The effective sample size N e ff, a measure of how many 
particles that actually contributes to the approximation of the distribution, is often used to decide when 
to resample. When N e ff drops below a specified threshold, resampling is performed. Many efficient 
resampling schemes have been proposed in the literature. Instead of going into the details, we refer the 
interested readers to [ 241, 1291, [39j-[41] for a more general introduction to PF. 

Remark 2: If one uses the state transition density as proposal with resampling at each step, the 
corresponding PF is known as bootstrap particle filter. This is easy to implement, so very popular 
in practice. 


B. Rao-Blackwellized particle filtering (RBPF) 

Although PF is very popular and has been around for a while, it is computationally demanding and 


notably, it has severe limitations when scaling to higher dimensions [301. For certain models, when part 
of the state space is (conditionally) tractable, it is then sufficient to employ a PF for the remaining 
intractable part of the state space. If it is possible to exploit such analytical substructure, the Monte 
Carlo based estimation is then confined to a space of lower dimension. As a consequence, the estimate 
obtained is often better (in terms of asymptotic variance) and never worse than the estimate provided 
by the PF targeting the full state space. The resulting method is popularly known as Rao-Blackwellized 
particle filtering |32|, [42)—[45). Note that solving part of the state vector analytically (or using analytical 


approximations) leaves the remaining part to be targeted only by PF. Thus RBPF acts as a practical enabler 
for scaling to high dimensional problems. 
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IV. Inference for LDS under stationary non-Gaussian environment 

Consider the LDS in ([!]), driven by known stationary non-Gaussian noises, assumed to be hierarchically 
Gaussian as in Section [jlj Given the model and assuming that the initial prior, p(x o) is a known Gaussian 
densi tyj^j our objective is to recursively target the intractable filtering density p{x k \yi-.k)- 

Let at time step k, Aj' ! and X k be the corresponding mixing parameters for and e/, : . We define an 
auxiliary vector A/, = ( Ajf , \%) T ■ Then the noises w k and e k are hierarchically Gaussian as 

^fclAfc ~ AA(/i^(A fc ),<3fc(A fc )), (25a) 

e k \X k ~ -A/" (Atfe(Afc), -Rfc(Afe)), (25b) 

where p%, pf are the corresponding mean and Q k and R k are the corresponding covariance of the 
hierarchical Gaussian noise^j] These parameters can possibly depend upon A/.. Such a noise representation 
admits a CLGM, which is analytically tractable using the KF. This opens up a RBPF implementation for 
the online inference problem. In the sequel, we describe this RBPF framework. This section is organized 
as follows. We first outline one complete iteration of the proposed RBPF framework, which is followed 
by the specification of the dynamics for the state (i.e., mixing parameters), targeted by associated PF. 
Next, an algorithmic description of the approach is provided. This is then followed by the descriptions 
on the likelihood function estimation and p-step ahead prediction. 


A. Posterior propagation cycle of RBPF 


At time zero, p(x o) is known. Suppose at time step (/;• — 1), we have the joint target distribution 

p(x k -i, Ai : fc_i|yi;fc_i). This can be decomposed as 

( %k— 1 } 

P X 

where p{\\-.k-i\yi:k-i) is targeted by a PF and is empirically given by a set of N{» 1) weighted 
random particles as 

N 

P(M:k- l\yi:k- 1 ) = - A^-J, (27) 

1=1 


yi:k-l =p Xk-1 


Al:fc—1, 
yi-.k-i 


P\ Xl : k-l\yi:k-l , 


(26) 


’More generally, p(x o) can be taken to be a hierarchically Gaussian density. 

4 Instead of both the noises being non-Gaussian, if one of them is Gaussian, the inference framework, described subsequently, 
is still valid. In that case \k becomes an auxiliary random variable, representing the mixing parameter of the non-Gaussian 
noise. 
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with w^_ j > 0 and Ylu=i w k -1 = 1- Using ( [27] ), we have 

N 

p(h-i\yi:k-i) = 


(28) 


Z— 1 


Noting that, given a sequence A^] c _ 1 , the dynamic system now becomes a CLGM. So p{xk-i\ 1 , yi-.k-i) 


can be obtained by a KF, given by 

p(x k -i\x[^ k _ 1 iyi:k-i) = Af(x k -i(i),Pk-i(i)), 


(29) 


(z) 

where for notational clarity, we suppress the dependency of the parameters on A^; fc _ 1 . Since a KF is 
running for each sequence (henceforth denoted by an index i), we have total N number of KFs running 


in parallel at any given time. Now using (27 > and ([29]) together, the filter distribution p(xk-i\yi : k-i) can 
be obtained as 


P(Xk-l\yi:k-l) = P\ ®fe-l 


X p\ Ai : fe_i 

N 


Al:fe-1, 

yi-.k-i 

yi-.k-l dAi:fc_i, 


i>2 w k-i A/'^x fc _i(i),P fc _i(i)V 

i=i A J 


(30) 


which is a weighted (finite) mixture of N Gaussian distributions. The mean and covariance of this filter 
distribution (assuming them to be finite) can be given as 


N 


Xk -1 = 


i=l 

N 


Pk-i = ~ ®fc-i(*))(• ) 1 }> 


(31a) 


(31b) 


i=l 


where (A)(-j r is a shorthand for AA T . Now having observed y/., we would like to propagate the joint 
posterior in ([26]) to time k, i.e., 


P(Xk-l, \\,k-l\yi,k-l) p{Xk,Al:k\yi:k)- 

This can be achieved in the following steps ((l)-(4)) as described below: 


(32) 


1) PFprediction step: Generate N new samples a[’ 7 from appropriately chosen proposal 7r(A/, : | A' | , .j, ^, y\-±)- 

Then set A^ = {A^*]L_ 1? A^}, for i = 1, • • • ,N, representing the particle trajectories up to time k. For 

proposal selection, see Remark [4] below. 
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2) KF prediction step: Since the prior p(xk-i\X^} k _ 1 ,yi:k-i) for this step is Gaussian (given by 

( [29] )) and the noises are also Gaussian given A^ , (see ([25])), the dynamic system is now linear-Gaussian 

conditioned on ASo, using KF, the predictive distribution can be obtained analytically, which is also 
Gaussian and given as p(x k \X^ k , yi :k -i) = N{x k \k-i(i), ^fc|fc-iW), where 

x k \k-i(i) = A k x k _i (i) + B k p%(i) (33) 

-ffc|fc-i(*) = A k P k _ i(i) A k + B k Q k (i ) B k . (34) 

3) KF update step: Suppose we have now the new observation y k . We can now update to the posterior 
distribution p(x k \X^l, y\ :k ), which is also Gaussian due to the CLGM, denoted as 

p(x k \X^ k ,yi: k ) = Af(x k (i),P k (i)). (35) 

The parameters x k (i), P k {i) can be obtained from the KF recursively, using the following steps: 

®fc(*) = (*) + 

K k {i){y k - C k x k \ k _i(i) - p%(i)} (36) 

p k{i) = Pk\k-i(i) - K k{i) c k P k \ k -i{i), (37) 

where 

K k (i) = P k \ k -i(i) Cl S^(i) (38) 

S k (i) = C k P^ii) Cl + R k {i). (39) 

Moreover, the marginal likelihood is also obtained in closed form, which is also Gaussian and is given 
by 

p{yk\>^£ k ,yi-. k -i)=N{p k {i), e£(*)), (40) 

where 

tiki}) = C k x fc | fc _t(t) + /*!(*) (41a) 

Z£(i) = S k (i). (41b) 

4) PF update step: Now given the observation y k and the particles {Aj' j, }W ], we need to update to 
the posterior 

N 

p(Ai:fclyi:fc) = 'y' j w^5(\i:k - A^ fc ); > 0, (42) 
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with J2iLi w k’ = 1. The corresponding weight wY’ can be obtained (using ((22])) as 


dfi _ 


(*) 


.,(0 


~(i) 
w k = 


p(yk\>^ii,yi:k-i)p(^\^i’ k - V yi:k-i) „($ 


(t)i \(0 




w 


k- 1> 


W 


(0 


Eli wl 


(43a) 


(43b) 


The density p{yk\^{\, yi-.k-i) is already obtained in ( |40| >. For now we assume that yi-.k-i) 


(t)i \(0 


is specified, which we discuss further in |IV-B 


This completes one propagation cycle. To propagate to the next cycle, we first resample the particles 
A|![ : (and also the associated [x}■[%). Pkft) }) whenever necessary, before following again the steps 

(l)-(4). 


Remark 3: so long we can directly generate samples from Wk and e/ ;; , we can in principle, target the 
same inference task e.g., using a PF. Flowever, when the state space is high dimensional, PF is well 
known to be inefficient. On the other hand, in our RBPF framework, PF targets only low dimensional 
state (auxiliary mixing vector, A/.); conditionally the remaining state vector arc treated using KF. As a 
result, RBPF can scale well with the dimensions here. 


B. Specification of dynamics for Xk 

As PF is essentially designed for the dynamically evolving model, we need to specify a proper transition 
density p(Afc|Ai : fc_i, yi : fe_i), describing the time evolution of the path space Ai : /, : . Flowever this transition 
density is typically unknown and the only information we have is that A/, is distributed according to a 
(known) stationary density (say, p*). We note that earlier in the context of symmetric a-stable noise, | |46| , 
|47| treated Xk as unknown parameter with p(\k\^i-.k-i,yi:k-i) = p(Afc) = p* (i.e., A& are generated 
independently over k). In the PF context, this is completely arbitrary and fundamentally weak, as path 
space based recursion (see e.g., ( |43a[ )) requires a transition kernel, which links any newly generated 
particle to its ancestral lineage. To our knowledge, a provable state transition density in this context is 
not addressed adequately in the literature. 

In our approach, we specify the transition density through a Markov kernel such that the samples 
generated from this kernel constitute a Markov chain with the invariant distribution as marginal; since 
this invariant distribution is already known, we initialize the chain with this invariant distribution. In doing 


so, we follow [481, where the authors construct an AR(1) process using the auxiliary latent variables, such 
that the resulting time series is stationary with known marginals. Moreover, as shown in the original article, 
the time series model follows simple auto-correlation function, which decays exponentially. This property 
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is in fact very beneficial as it can limit the dependency of the sufficient statistics of p(xk\Xi : k,yi-.k) on 
the particle path space. 


As shown in [481, based on the known invariant marginal, one can construct such a transition kernel 
easily, when p( A&) belongs to infinitely divisible convolution-closed exponential family and a family of 
density functions that includes among others, (inverse) gamma, normal and beta densities. To keep the 
description simple, rather than going into the further details, we describe below one example to illustrate 
the idea. For more details, the reader may refer to [ |48| . 

1) Example: note from II-A4| if X is distributed according to a multivariate variance gamma, then 
the marginal p( A) is following an inverse gamma given by (fT3j)-(fT4l). Suppose we are interested in 


constructing p(Xk\Xk-i)- This can be implicitly obtained using the following relation [481 

v/2 + Uk Xk~i 


Xk 


Vk 


(44) 


where Uk ~ Ga(a. 1) and 14 ~ Ga{v /2 + a, 1), which are independent and also independent of each 
other. We also have 

E(Xk\Xk-i) = p{l — p) + pXk-i, (45) 

where p = is the autocorrelation functiorj^] and // = is the mean of the marginal distribution. 

Note that E(Xk\Xk~i) exists for u > 2 and the autocorrelation function of A/,, is p(r) = p T . Thus, the 
transition kernel can be specified by a first order AR model with inverse-gamma marginals. The unknown 
parameter a (associated with Uk and 14 ) is specified here through the selected value of p. 


Remark 4: The optimal proposal [49] given by 7r(Afc|Ai : fc_i, yi-k) = 7r(Afc|A^_i, yk) is often unavailable 
in practice. One simple alternative is to implement a bootstrap PF with transition kernel 7r(|Afc—i) as 
the proposal. 


C. Algorithmic summary 

A bootstrap implementation of the proposed RBPF for the LDS with hierarchical Gaussian noises are 
summarized in Algorithm [T] 

D. Estimation of the likelihood function 

The likelihood function is the joint density of the observation pi, - ■ ■ , yr • which can be decomposed as 

T 

p(yi,--- ,Vt) = n^l^-t), (46) 

k= 1 


^The autocorrelation function is restricted to be positive (i.e., p £ (0,1)) for such construction. 
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Algorithm 1 RBPF for LDS 
Set HGM for process and/or measurement noises 

- p(X k ) is set accordingly 

For each particle i = N do 

Initialization: 

• set p(xq^) for each KF 

Iterations: 

For k = 1,2,... do 

1) PF prediction step 

fit 

• sample X k according to section 

-if k = 1, ~ p{ X k ) 

- else, ~p(AjfclA^lj) 

. Qpt \W A (\(i) 

• Set \:k ~ ) 

2) KF prediction step 

• set (Eq- @) 

. compute x k \ k _ ± {i) and P k \ k -i(i) (Eq. ©-((34)) 

3) KF update step 

• compute x k (i) and P k {i) (Eq. (f36l)-([39|>) 

. compute p{y k \X±l,yi: k -i) (Eq. (|40)-(|4T)) 

4) PF update step 

• weight update 

- W oc p(y k \X^ k , yi.k-i) 

- normalize weights 

• resample the particles 

- let resampled particle indices be 3te{ !,••• ,N} 

- set x k {i) = x k (ji ) and P k (i) = P k (ji) 

- Set A i:fc = A S:fc 

- move to the next step 


IV-B 
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where for k = 1, p(yk\yi:k-i) i s interpreted as p(yi\0) = p(yi). The density p(yk\yi-.k-i) can be obtained 
by marginalizing \\. k in (|40|) using (|42|) as 


N 


p(yk\yi-.k-i) ~ Efc(*)). 


(47) 


2=1 


E. p-step ahead prediction 

Suppose at time step k, we have the observations { 2 / 1 , - - - , yp } and the corresponding filter density 
p(xk\yi:k)- The p-step ahead prediction of the state can now be obtained as follows: 


p{xk+p\yi:k) = P(x k+p \yi:kAl:k) p(^l-.k\yi:k)d\l:k 


N 


J2^P(xk + M:k,^ 


(O' 


k>■ 


2=1 


Now p(xk+ p \yi:k, A^) can be obtained using p-step ahead KF prediction as 


p(xk+ P \yi:kAii) =N{p 


(0 y(0 'i 

k+p\k : z ~ l k+p\k) 


with 


t l k+ p \k ~ ^ > fe+l+p,fc+i x k{i) + ^ ^ *bfc+i+p,fc+i+j Bk+j dk+j(d)i 

3 = 1 


(48) 


(49) 


(50) 


^i+p|fc = ^fc+l+p,fc+l Pk (*) (^fc+l+p^+l) + 

p 

^ ^ ^fc+l+Pjfc+l+j Bk+jQk+j{t){Bk+j') (^fe+l+p,fe+l+j) 1 (51) 

3 =1 

where we use the notation = A k -iA k -2 ■ • • Ai (k > l) and ^ k ,k = I [31. 


V. Robust noise adaptive filtering for LDS 

Although, the distribution of the noise is assumed to be stationary and known in the previous section, 
it is important to observe that the proposed algorithm is already doing a noise adaptive filtering. To see 


this, first note from ( [28] ) that we are also estimating A k sequentially over time. Now moving from time 
(k — 1) to k, when we generate N new samples {A^ }^ 1; the empirical distribution associated with 
p(Xi:k) is given by the particle cloud {A^ fc , (1/iV)}^. Since the noise in ([25]) at time k is assumed to 
be hierarchically Gaussian, we can now represent the density for the noise (say Vk) as 


N 


2=1 


(<)• 


k (0 


p(v k ) » — ^ ACJ v k \p(X { l’ k ), S(A^)), 


(52) 
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which is an equally weighted finite (random) Gaussian mixture (with component weight = 1/N). This 
serves as our prior for the (unknown) noise. When the new measurement ;///. arrives, we update to noise 
posterior using the Bayes rule. The posterior now consists of the same random mixture components, 
but the component weights are adjusted according to the observation likelihood (i.e. p(yfc|A^L yi-.k-i))- 
This construction ensures that we have a non-Gaussian noise adaptive filter. However, the stability of 
this filter is still sensitive to any potential large noise that cannot be accounted by the selected prior. In 
practice, the robustification for this noise adaptive filter is done through a flat prior selection. This ensures 
that the resulting filter is robust to outliers, yet at the same time due to its noise adaptive behavior, the 
performance does not degrade when the outliers are absent. 

VI. Numerical studies 

We start here with the inference problem when the measurement noise is skewed but its distribution is 
known. This is illustrated for a stochastic volatility problem (finance). Next we consider the cases where 
noise parameters are unknown and time varying. Here we test the proposed robust noise adaptive filter 
on a time series problem for the following cases: (a) unknown measurement noise and (b) both process 
and measurement noises are unknown. Finally we consider a maneuvering target tracking example and 
compare the proposed approach with the interacting multiple model (IMM) algorithm. 


A. LDS with known stationary non-Gaussian noise 

We consider the online filtering problem for the volatility example defined by (|3a|)-(|3b|). This example 
illustrates the strength of the proposed framework in terms of treating a skewed noise (although we note 
that PF is more appropriate for this scalar problem when computational cost is considered). As is evident 
from Figure (jT[), the observation noise is highly skewed. In the literature, this has been approximated 


e.g., by a mixture of seven Gaussian components by equating the first four moments [501. Here we 
approximate this noise density using a GH skew Student’s t distribution with // = 1.75, B = —2.3, 


£ = 1 and v = 5.8. Following [21], we select 70 = 0, 71 = 0.9 and a ^ = 0.1 and using generate 
a (simulated) return data ;///,. for 1000 time steps. Next, using the simulated data, we estimate the filter 
distribution by AlgorithnjT] We observed that the algorithm is handling well the sequential estimation 
task. The simulated data, the (approximated) observation noise density and a typical realization for the 
filter mean estimates are plotted in Figure (j2j). 
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Figure 2: Stochastic volatility example: (top) simulated return data y k , (middle) observation noise density 
and its GH skew Student’s t approximation, (bottom) the true log-volatility and one realization of the 
filter mean estimates. 
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B. Time varying and unknown noise parameters 

Here we start with the case of unknown measurement noise, which is then followed by the case, where 
both the process noises and measurement noises are unknown. 

1) Measurement noise is unknown: we consider the following time series problem given by 

s k = 1.51s fc _i - 0.55s fc _2 + w k , (53a) 

Vk = s k + e k , (53b) 

where the signal s k is evolving as a second order auto-regressive process, with w k ~ 2/(0, 1). The 
distribution for the measurement noise e k is assumed to be unknown. The simulated (synthetic) mea¬ 
surements y k are generated using the mixture distribution e k ~ 0.952/(0.10) + 0.Q5A r (0,100). For the 
robust noise adaptive filter, the prior for unknown e k is taken as a zero mean Student’s t with E = 10 4 
and a = 5. A typical realization of the filter mean estimate is shown in Figure Q. One can observe 
that the measurements contain outliers that are appearing sporadically (statistically independent from one 
another), but the filter is doing a really good job in estimating the hidden signals. 

But there may be situation where the outliers can be persistence (time correlated). For example, noise 
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Figure 3: When measurement noises containing 
sporadic outliers; true signal measure¬ 

ments (’—o’) and mean estimates of the signal 



Figure 4: When measurement noises containing 
persistent outliers; true signal measure¬ 

ments (’—o’) and mean estimates of the signal 
(’-■’)■ 


may temporarily switch to a different regime (with a higher covariance) due to sensor malfunction. This 
change in noise cannot be counted normally by a nominally specified noise model. We consider this 
scenario next. For the simulated observations, we assume that the measurement noises for the first 100 
time steps is distributed as ~ A/"(0,10). Then due to the sensor malfunctioning, the measurement noises 
for the next 100 time steps is distributed as e/- ~ A((0,100). However, this knowledge is not available 
to the filter. Again the prior for unknown e/,. is taken to be a zero mean Student’s t with X = 10 4 and 
v = 5 and a typical realization of the filter mean estimate is shown in Figure (|4]). In this case as well, 
the proposed filter is doing a good job in estimating the signals. 

2) Both process and measurement noises are unknown: for this problem, although the framework 
is relatively simple (i.e., use HGM priors for both the noises as in (|25|)), we note that the inference 
problem can be difficult, at least for certain problems, due to identifiability issue. For example, if for 
any running KF (indexed by ;), there is any larger than the usual value of innovation (defined as {;<//. — 
Ck Xk\k -1 (*) — } i n (F6l)k KF cannot immediately distinguish whether this is due to any observation 

outlier or structural break in the process model. Nonetheless, filtering with unknown heavy tailed process 
and measurement noises have earlier been considered in e.g., |j9j, [16]], [271 and so we include some 


numerical studies as well. We consider the LDS as in (531, where the true (but unknown) noises are 
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distributed as Wk ~ 0.8AA(0,10)+0.2jV(0,100) and ~ 0.9A/"(0, 100)+0.17V(0,1000). For filtering, the 
state is estimated using 50 particles and the noise priors as p{wk) = T( 0 , 10 4 , 5) and p(ek) = T( 0 , 10 5 , 5) 
respectively. The true (generated) noise realizations are shown in Figure ( [5a| while the true and a typical 
estimated (mean) state along with measurements are shown in Figure ( |5b| ). Next, we consider the case of 
persistent noise outliers with the same filter settings. Flere the process noises for first 40 step is generated 
with A/"(0,100) and the rest with AA(0,10). Similarly the measurement noises for first 20 step is generated 
with Af(0, 1000) and the rest with Af(0, 100). The true (generated) noise realizations for this case are 
shown in Figure ( |5c] ). Corresponding true and a typical estimated (mean) state along with measurements 
are shown in Figure (|5d|). For both cases, we observe that the filter is performing reasonably well. 


C. Tracking a maneuvering target 

We consider a (simplified) problem of tracking a maneuvering target in two dimensional plane, where 
the state vector (xj.) consists of the Cartesian position and velocity. The noise corrupted snapshots of the 
positions {ly.) are available as the measurements. The target starts at (0, 0) and initially proceeds with 
a linear motion. This is followed by a (clockwise) coordinated turn and then again a linear motion. The 
true target trajectory and the noisy measurements are shown in Figure ([6]). We propose to track the target 


first, using a so called constant velocity (CV) model [5TJ given by 

h TI 2 


% k -(-1 — 


Xk + 


TIt 
2 42 

Th 


Vk 


(54) 


0 I 2 

Vk = \h 0] x k + e k , (55) 

where T is the sampling time in second (we use T = 1), I 2 is a two dimensional identity matrix, V}. and 
ek are the process and the measurement noise respectively. We assume that ek is generated according 
to a known distribution, given by ~ A/"(0, diag[80 2 , 80 2 ]), which is also used to generate the true 
measurements here. Flowever due to multi-modality in the track behavior, an appropriate process noise 
is difficult to specify. This point is illustrated by estimating the track using two different process noise 
standard deviations, i.e., a v = 1 and a v = 50 (assuming V). to be Gaussian). The results are shown Figure 
(7a 1 and Figure ( |7b| ) respectively. We see that o v = 1 is a better choice for the (initial) linear part of the 
trajectory, whereas a v = 50 is more suitable during the maneuvering part. Thus a fixed form of process 
noise is not appropriate for this problem. Moreover, specification of suitable noise parameters (i.e., a v 
here) is often not obvious. 

Subsequently, we try our robust noise adaptive (RBPF) filter. Here the prior for the process noise is taken 
as a symmetric Laplace ( p(vk ) ~ £(0,10 6 )) and we use 50 particles and p = 0.05. A typical estimated 
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x (m) 

Figure 6: The true positions and the measured positions (’o’) of the target (the target starts at (0, 
0)). 


track is shown in Figure (7c). We also implement an IMM filter [ 521 with (Gaussian process noise) 
(T v = 1 and a v = 50. We assume that the initial probability for the modes are equal and a mode transition 
probability matrix 7r = [0.9 0.1; 0.1 0.9]. A typical estimate of the track is shown in Figure (7di. We 
observed that both our RBPF and IMM are performing well for this problem. We also compare them over 
20 Monte Carlo runs. The statistics for the maximum absolute errors (max abs err) and the average root 
mean square error (avg. RMSE) are shown in Table [Tj From Table [Tj we see that IMM is performing slightly 



IMM 

RBPF 

max abs error 

164.95 

191.87 

avg. RMSE 

65.33 

69.71 


Table I: Error statistics over 20 Monte Carlo runs 


better, although at the cost of requiring a properly specified mode transition matrix 7r. In practice, often 
7T is unknown and estimating 7r online is an arduous task. This problem does not arise in our approach. 
Finally, there may be outliers in the measurement noise as well e.g., due to occasional sensor malfunctions. 
This problem cannot be solved trivially by tuning the parameters of the noises in a KF setup. It is hard 
for any filter to distinguish immediately whether any outlier is arising from process or measurement 
noise. To test our approach, we keep the same trajectory, but now the measurements are generated from a 
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Figure 7: Maneuvering target tracking example; true and estimated trajectory of the target 
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mixture distribution given by ~ 0.8Af(0, diag[ 80 2 , 80 2 ]) + 0.2Af(0, dm<?[300 2 , 300 2 ]). We implement 
our algorithm with priors for the process and the measurement noises to be Laplace and Students’ t 
noise respectively, given by p(vk) £(0,10 8 ) and p(ek) T( 0 , 10 s ; 4). Again we use 50 particles and 
p = 0.05. The filter is performing reasonably well in most cases. A typical estimate of the track is shown 
in Figure <|8]). 




Figure 8: (left) true trajectory with measurements corrupted by outliers, (right) estimated trajectory by 
robust noise adaptive model 


VII. Concluding remarks 

This article considers the difficult online inference problems for linear dynamic systems under a very 
realistic set-up, where (a) the noises can have non-Gaussian densities (either appearing naturally or are 
modeled to accommodate outliers) and (b) the noise parameters can be unknown and time varying. The 
corresponding non-Gaussian density is characterized in terms of the skewness and/or the heavy tails, that 
is commonly observed in many practical applications of interest. We note that unlike the heavy tails, 
inference with the skewed noise has not attracted considerable research attentions. 

For the inference task, we envisage here a new Rao-Blackwellized particle filter by leveraging on a so 
called hierarchical Gaussian model on the noises. We showed that the proposed framework is not only 
robust to any noise outlier, but also adaptive to potentially unknown and time varying noise parameters. 
However, the framework requires a valid transition kernel for the intractable state, targeted by the particle 
filter. We outlined how such kernels can be constructed, at least for certain classes of noises that cover 
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many commonly occurring (non-Gaussian) noises, where the well explored Students’ t is just a special 
case. The framework essentially runs a bank of (interacting) Kalman filters and so is relatively easy to 
understand and/or implement. We also explained how the problem can easily scale up with the dimensions 
using Rao-Blackwellization idea. The proposed algorithm here is very simple and flexible, although a bit 
computationally demanding. We subsequently carried out numerical experiments including a comparison 
to IMM. We observed empirically that the framework is doing a very good job even with only 50 
particles. The associated tasks of (offline) state smoothing, model parameters learning and also extending 
the framework for space-time inference arc left as future works. 
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